Optimize vector::assign for InputIterator-only pair inputs #113852

winner245 · 2024-10-28T02:16:45Z

This PR optimizes the input iterator overload of assign(_InputIterator, _InputIterator) in std::vector<_Tp, _Allocator> by directly assigning to already initialized memory, rather than first destroying existing elements and then constructing new ones. By eliminating unnecessary destruction and construction, the proposed algorithm enhances the performance by up to 2x for trivial element types (e.g., std::vector<int>), up to 2.6x for non-trivial element types like std::vector<std::string>, and up to 3.4x for more complex non-trivial types (e.g., std::vector<std::vector<int>>).

Google Benchmarks

Benchmark tests (libcxx/test/benchmarks/vector_operations.bench.cpp) were conducted for the assign() implementations before and after this patch. The tests focused on trivial element types like std::vector<int>, and non-trivial element types such as std::vector<std::string> and std::vector<std::vector<int>>.

Before

-------------------------------------------------------------------------------------------------
Benchmark                                                       Time             CPU   Iterations
-------------------------------------------------------------------------------------------------
BM_AssignInputIterIter/vector_int/1024/1024                  1157 ns         1169 ns       608188
BM_AssignInputIterIter<32>/vector_string/1024/1024          14559 ns        14710 ns        47277
BM_AssignInputIterIter<32>/vector_vector_int/1024/1024      26846 ns        27129 ns        25925

After

-------------------------------------------------------------------------------------------------
Benchmark                                                       Time             CPU   Iterations
-------------------------------------------------------------------------------------------------
BM_AssignInputIterIter/vector_int/1024/1024                   561 ns          566 ns      1242251
BM_AssignInputIterIter<32>/vector_string/1024/1024           5604 ns         5664 ns       128365
BM_AssignInputIterIter<32>/vector_vector_int/1024/1024       7927 ns         8012 ns        88579

llvmbot · 2024-10-28T02:17:19Z

@llvm/pr-subscribers-libcxx

Author: Peng Liu (winner245)

Changes

Summary

This PR optimizes the vector::__assign_with_sentinel function by reusing existing memory more effectively, resulting in improved performance.

Details

Memory Reuse: The new implementation assigns directly to the already initialized elements instead of destroying all of them first. Destruction now occurs only when the original vector holds more elements than the input iterator range supplies. For element types that own resources, keeping the existing elements alive can also avoid deallocating and reallocating those resources. The reduced overhead is most visible for pre-populated vectors whose size is about the same before and after assignment, where the benchmarks below show gains of roughly 2.1x.
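To make the strategy above concrete, here is a minimal sketch written against the public `std::vector` interface only. The names `assign_reusing_elements` and `words_after_assign` are hypothetical and exist only for illustration; the actual change lives in the internal `__assign_with_sentinel` shown in the diff below.

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical free-function analogue of the patched logic (illustration only).
template <class T, class InputIt>
void assign_reusing_elements(std::vector<T>& v, InputIt first, InputIt last) {
  auto cur = v.begin();
  // Overwrite the elements that are already alive; nothing is destroyed here.
  for (; first != last && cur != v.end(); ++cur, (void)++first)
    *cur = *first;
  if (cur != v.end()) {
    // The input ran out first: destroy only the surplus tail.
    v.erase(cur, v.end());
  } else {
    // The vector ran out first: construct only the additional elements.
    for (; first != last; ++first)
      v.emplace_back(*first);
  }
}

// Hypothetical helper: feeds whitespace-separated words through a genuine
// single-pass input iterator (std::istream_iterator).
std::vector<std::string> words_after_assign(std::vector<std::string> v, const std::string& text) {
  std::istringstream in(text);
  assign_reusing_elements(v, std::istream_iterator<std::string>(in),
                          std::istream_iterator<std::string>());
  return v;
}
```

For example, `words_after_assign({"x", "y", "z"}, "a b")` yields a two-element vector holding `"a"` and `"b"`: the first two strings are overwritten in place and only the third is destroyed.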

Testing

Benchmark tests (Quick-Bench Results) show significant performance improvements for test cases with pre-populated elements where the vector sizes are about the same before and after assignment:

1000 -> 1000: ~2.1x faster
1000 -> 1: roughly the same
1 -> 1000: roughly the same

where m -> n represents the size change from m to n due to assignment.
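One way to reason about these three cases is to count element operations rather than time. The sketch below uses a hypothetical instrumented type `Counted` and a hypothetical driver `tally_assign`; neither is part of the PR or its benchmarks. Capacity is reserved up front so that reallocation does not distort the counts.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element type that tallies constructions, assignments, destructions.
struct Counted {
  static int constructed, assigned, destroyed;
  int value;
  Counted(int v = 0) : value(v) { ++constructed; }
  Counted(const Counted& o) : value(o.value) { ++constructed; }
  Counted& operator=(const Counted& o) { value = o.value; ++assigned; return *this; }
  ~Counted() { ++destroyed; }
  static void reset() { constructed = assigned = destroyed = 0; }
};
int Counted::constructed = 0;
int Counted::assigned = 0;
int Counted::destroyed = 0;

struct Tally { int constructed, assigned, destroyed; };

// Assigns n source elements to a vector of m elements and reports the counts.
// reuse_elements == false models clear() + emplace_back (the old code path);
// reuse_elements == true models assign-in-place (the new code path).
Tally tally_assign(std::size_t m, std::size_t n, bool reuse_elements) {
  std::vector<Counted> dst(m), src(n);
  dst.reserve(m + n);  // keep reallocation out of the measurement
  Counted::reset();    // count only the assignment step itself
  auto first = src.cbegin();
  auto last = src.cend();
  if (!reuse_elements) {
    dst.clear();
    for (; first != last; ++first)
      dst.emplace_back(*first);
  } else {
    auto cur = dst.begin();
    for (; first != last && cur != dst.end(); ++cur, (void)++first)
      *cur = *first;
    if (cur != dst.end()) {
      dst.erase(cur, dst.end());
    } else {
      for (; first != last; ++first)
        dst.emplace_back(*first);
    }
  }
  return Tally{Counted::constructed, Counted::assigned, Counted::destroyed};
}
```

Under these assumptions, 1000 -> 1000 trades 1000 destructions plus 1000 constructions for 1000 assignments, while 1000 -> 1 and 1 -> 1000 still perform about 1000 destructions or constructions either way, which is consistent with the "roughly the same" timings reported above.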

Full diff: https://github.com/llvm/llvm-project/pull/113852.diff

1 Files Affected:

(modified) libcxx/include/__vector/vector.h (+8-3)

diff --git a/libcxx/include/__vector/vector.h b/libcxx/include/__vector/vector.h
index 7889e8c2201ac1..6c37c3113a536a 100644
--- a/libcxx/include/__vector/vector.h
+++ b/libcxx/include/__vector/vector.h
@@ -1031,9 +1031,14 @@ template <class _Tp, class _Allocator>
 template <class _Iterator, class _Sentinel>
 _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void
 vector<_Tp, _Allocator>::__assign_with_sentinel(_Iterator __first, _Sentinel __last) {
-  clear();
-  for (; __first != __last; ++__first)
-    emplace_back(*__first);
+  pointer __cur = __begin_;
+  for (; __first != __last && __cur != __end_; ++__cur, ++__first)
+    *__cur = *__first;
+  if (__cur != __end_)
+    __destruct_at_end(__cur);
+  else
+    for (; __first != __last; ++__first)
+      emplace_back(*__first);
 }
 
 template <class _Tp, class _Allocator>
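
The loop increment in this diff is written as `++__cur, ++__first`, and a later commit in this PR is titled "Avoid invoking operator,". The exact change in that commit is not shown here, so the following is only a sketch of the general concern: an iterator type may overload the comma operator, and casting one operand to `void` forces the built-in comma. `NoisyIt`, `g_comma_calls`, `copy_unguarded` and `copy_guarded` are hypothetical names.

```cpp
#include <cassert>

int g_comma_calls = 0;  // how often the overloaded comma operator ran

// Hypothetical iterator that hijacks the comma operator.
struct NoisyIt {
  const int* p;
  int operator*() const { return *p; }
  NoisyIt& operator++() { ++p; return *this; }
  friend bool operator!=(const NoisyIt& a, const NoisyIt& b) { return a.p != b.p; }
  template <class T>
  friend void operator,(T&&, NoisyIt&) { ++g_comma_calls; }
};

// `++cur, ++first` selects the user-defined operator, via ADL.
void copy_unguarded(int* cur, int* end, NoisyIt first, NoisyIt last) {
  for (; first != last && cur != end; ++cur, ++first)
    *cur = *first;
}

// A void operand cannot match any overloaded operator, so the built-in
// comma is used and the user-defined one is never called.
void copy_guarded(int* cur, int* end, NoisyIt first, NoisyIt last) {
  for (; first != last && cur != end; ++cur, (void)++first)
    *cur = *first;
}
```

Both functions copy the same values; only the unguarded one calls the user-defined operator once per iteration.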

libcxx/include/__vector/vector.h

ldionne

LGTM with minor comments. This is great!

libcxx/include/__vector/vector.h

ldionne · 2024-11-11T18:07:44Z

libcxx/test/benchmarks/ContainerBenchmarks.h

    c1 = c2;
    DoNotOptimizeData(c1);
    DoNotOptimizeData(c2);
  }


Not attached to this line: Can you please add a release note to 20.rst mentioning this optimization?

Thank you for your positive feedback and recognition of my work! I appreciate your time and effort in reviewing this PR. I have added a description of this performance optimization to the release notes and rebased the PR onto the main branch. Thanks again for your help and support!

libcxx/test/benchmarks/ContainerBenchmarks.h

ldionne

LGTM!

It would be great if @philnik777 or @frederick-vs-ja could also have a look to make sure I didn't miss something related to conformance.

philnik777

The implementation itself LGTM, but I think we want to rework the benchmarks. I'd also like to see the actual benchmark results and would rather have some text in the commit message than graphics, since I don't think the graphics will show anywhere except on GitHub.

libcxx/docs/ReleaseNotes/20.rst

libcxx/include/__vector/vector.h

libcxx/test/benchmarks/ContainerBenchmarks.h

github-actions · 2024-11-12T16:07:42Z

✅ With the latest revision this PR passed the C/C++ code formatter.

libcxx/test/benchmarks/ContainerBenchmarks.h

ldionne

The code changes still LGTM, I have a few comments on the benchmarks and most importantly I'd like @philnik777 to chime in to say whether he's satisfied with the benchmarks, since he had requested some changes.

ldionne · 2024-11-26T21:55:20Z

libcxx/test/benchmarks/GenerateInput.h

+}
+
+template <class IntT>
+inline std::vector<std::vector<IntT>> getRandomIntegerInputsWithLength(std::size_t N, std::size_t len) { // N-by-len


Suggested change

-inline std::vector<std::vector<IntT>> getRandomIntegerInputsWithLength(std::size_t N, std::size_t len) { // N-by-len
+std::vector<std::vector<IntT>> getRandomIntegerInputsWithLength(std::size_t N, std::size_t len) { // N-by-len

inline not needed since this is a template.

libcxx/test/benchmarks/vector_operations.bench.cpp

ldionne

This LGTM, but please wait for @philnik777 to stamp this since he had comments.

winner245 · 2024-11-28T19:36:19Z

@philnik777 Thank you for the approval! I appreciate your time.

As a follow-up to #113852, this PR optimizes the performance of the `insert(const_iterator pos, InputIt first, InputIt last)` function for `input_iterator`-pair inputs in `std::vector` for cases where reallocation occurs during insertion. Additionally, this optimization enhances exception safety by replacing the traditional `try-catch` mechanism with a modern exception guard for the `insert` function.

The optimization targets cases where insertion triggers reallocation. In scenarios without reallocation, the implementation remains unchanged.

Previous implementation
-----------------------

The previous implementation of `insert` is inefficient in reallocation scenarios because it performs the following steps separately:

- `reserve()`: This leads to the first round of relocating old elements to new memory;
- `rotate()`: This leads to the second round of reorganizing the existing elements;
- Move-forward: Moves the elements after the insertion position to their final positions.
- Insert: Performs the actual insertion.

This approach results in a lot of redundant operations, requiring the elements to undergo three rounds of relocations/reorganizations to be placed in their final positions.

Proposed implementation
-----------------------

The proposed implementation jointly optimizes the above 4 steps of the previous implementation so that each element is placed in its final position in just one round of relocation. Specifically, this optimization reduces the total cost from 2 relocations + 1 `std::rotate` call to just 1 relocation, without needing to call `std::rotate`, thereby significantly improving overall performance.
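
The idea of placing each existing element once can be sketched with public `std::vector` operations. This is not the libc++ implementation: `insert_via_fresh_buffer` and `insert_ints_from_text` are hypothetical names, the sketch always allocates new storage (so it models only the reallocation case), and it buffers the single-pass input in a temporary vector first.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical illustration: build prefix + new elements + suffix directly in
// fresh storage, so each pre-existing element is moved exactly once and no
// std::rotate is needed.
template <class T, class InputIt>
void insert_via_fresh_buffer(std::vector<T>& v, std::size_t pos, InputIt first, InputIt last) {
  std::vector<T> incoming(first, last);  // an input range can be traversed only once
  std::vector<T> fresh;
  fresh.reserve(v.size() + incoming.size());
  auto mid = v.begin() + static_cast<std::ptrdiff_t>(pos);
  // Elements before the insertion point.
  fresh.insert(fresh.end(), std::make_move_iterator(v.begin()), std::make_move_iterator(mid));
  // The newly inserted elements.
  fresh.insert(fresh.end(), std::make_move_iterator(incoming.begin()),
               std::make_move_iterator(incoming.end()));
  // Elements after the insertion point.
  fresh.insert(fresh.end(), std::make_move_iterator(mid), std::make_move_iterator(v.end()));
  v.swap(fresh);
}

// Hypothetical helper: reads the inserted values through std::istream_iterator.
std::vector<int> insert_ints_from_text(std::vector<int> v, std::size_t pos, const std::string& text) {
  std::istringstream in(text);
  insert_via_fresh_buffer(v, pos, std::istream_iterator<int>(in), std::istream_iterator<int>());
  return v;
}
```

For instance, `insert_ints_from_text({1, 2, 5, 6}, 2, "3 4")` produces `{1, 2, 3, 4, 5, 6}`.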

winner245 requested a review from a team as a code owner October 28, 2024 02:16

llvmbot added the libc++ label Oct 28, 2024

ldionne added the performance label Oct 31, 2024

ldionne reviewed Nov 1, 2024

winner245 force-pushed the winner245/vec_assign_with_sentinel branch from 9f5bbf5 to 83d79b3 Compare November 7, 2024 20:24

ldionne approved these changes Nov 11, 2024

winner245 force-pushed the winner245/vec_assign_with_sentinel branch from 83d79b3 to 03b8721 Compare November 11, 2024 19:46

ldionne approved these changes Nov 11, 2024

philnik777 requested changes Nov 11, 2024

winner245 force-pushed the winner245/vec_assign_with_sentinel branch from c5b2e3f to 87f94b9 Compare November 12, 2024 16:03

winner245 changed the title ~~Optimize __assign_with_sentinel in std::vector~~ Optimize input iterator overload of std::vector::assign(first, last) Nov 14, 2024

winner245 force-pushed the winner245/vec_assign_with_sentinel branch from 549ba00 to 22e78f4 Compare November 14, 2024 03:43

philnik777 reviewed Nov 15, 2024

winner245 force-pushed the winner245/vec_assign_with_sentinel branch 2 times, most recently from a2e4ff3 to 6112450 Compare November 17, 2024 20:51

ldionne reviewed Nov 26, 2024

winner245 changed the title ~~Optimize input iterator overload of std::vector::assign(first, last)~~ Optimize std::vector::assign for InputIterator-pair inputs Nov 28, 2024

winner245 changed the title ~~Optimize std::vector::assign for InputIterator-pair inputs~~ Optimize vector::assign for InputIterator-only pair inputs Nov 28, 2024

winner245 added 8 commits November 28, 2024 10:57

Improve __assign_with_sentinel in std::vector

a7aa7c5

Avoid invoking operator,

75eab99

Add release note to this optimization in 20.rst

c131b5c

Restructure benchmark tests

52121af

Run clang-format

08abdb1

Update release note

c9c6fb9

Refactor tests

91eb3be

Rebase and a little refactor

4fa53ef

winner245 force-pushed the winner245/vec_assign_with_sentinel branch from 6112450 to 4fa53ef Compare November 28, 2024 16:16

ldionne approved these changes Nov 28, 2024

philnik777 approved these changes Nov 28, 2024

philnik777 merged commit 056153f into llvm:main Nov 28, 2024
62 checks passed

winner245 deleted the winner245/vec_assign_with_sentinel branch November 28, 2024 21:30

winner245 mentioned this pull request Dec 20, 2024

Optimize input_iterator-pair insert for std::vector #113768

Merged
